End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech

Authors

  • Hai Xuan Pham
  • YuTing Wang
  • Vladimir Pavlovic
Abstract

We present a deep learning framework for real-time speech-driven 3D facial animation from raw waveforms alone. Our deep neural network directly maps an input sequence of speech audio to a series of micro facial action unit activations and head rotations that drive a 3D blendshape face model. In particular, our deep model learns latent representations of time-varying contextual information and affective states within the speech. Hence, at inference our model not only activates the appropriate facial action units to depict different utterance-generating actions, in the form of lip movements, but also, without any assumption, automatically estimates the emotional intensity of the speaker and reproduces her ever-changing affective states by adjusting the strength of facial unit activations. For example, in happy speech the mouth opens wider than normal while other facial units are relaxed, and in a surprised state both eyebrows are raised higher. Experiments on a diverse audiovisual corpus of different actors across a wide range of emotional states show interesting and promising results. Being speaker-independent, our generalized model is readily applicable to various tasks in human-machine interaction and animation.
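The abstract says that the network's outputs, action unit activations and head rotations, drive a 3D blendshape face model. The sketch below illustrates only that final step, under assumptions the abstract does not confirm: a linear blendshape model (neutral mesh plus weighted per-unit offsets), activations clipped to [0, 1], and head rotation given as yaw, pitch, and roll in radians. The names `euler_to_matrix` and `animate_frame` are hypothetical, and the network that predicts the activations is not shown.

```python
import numpy as np


def euler_to_matrix(yaw, pitch, roll):
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch); angles in radians.

    The axis convention is an assumption made for illustration only.
    """
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def animate_frame(neutral, deltas, activations, rotation):
    """Pose one frame of a linear blendshape face.

    neutral:     (V, 3) vertices of the neutral face
    deltas:      (K, V, 3) per-unit vertex offsets from the neutral face
    activations: (K,) predicted unit strengths (clipped to [0, 1] here,
                 which is an assumption)
    rotation:    (yaw, pitch, roll) head rotation in radians
    """
    weights = np.clip(np.asarray(activations, dtype=float), 0.0, 1.0)
    # Weighted sum of offsets over the K units gives a (V, 3) displacement.
    shape = neutral + np.tensordot(weights, deltas, axes=1)
    # Rotate every vertex (stored as a row vector) by the head rotation.
    return shape @ euler_to_matrix(*rotation).T
```

In this reading, a stronger activation scales the corresponding offset, which is one simple way to picture how emotional intensity could translate into a wider mouth opening or higher eyebrows.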


Similar resources

Speech-driven 3d facial animation for mobile entertainment

This paper presents an entertainment-oriented application for mobile services, which generates customized speech-driven 3D facial animation and delivers it to the end user by MMS (Multimedia Messaging Service). Some important methods of this application are discussed, including the 3D facial model based on 3 photos, the 3D facial animation driven by speech or text on-line and the video format transform...


Learning-Based Facial Animation

This thesis proposes a novel approach for automated 3D speech animation from audio. An end-to-end system is presented which undergoes three principal phases. In the acquisition phase, dynamic articulation motions are recorded and amended. The learning phase studies the correlation of these motions in their phonetic context in order to understand the visual nature of speech. Finally, for the syn...


End-to-end Audiovisual Speech Recognition

Several end-to-end deep learning approaches have recently been presented which extract either audio or visual features from the input images or audio signals and perform speech recognition. However, research on end-to-end audiovisual models is very limited. In this work, we present an end-to-end audiovisual model based on residual networks and Bidirectional Gated Recurrent Units (BGRUs). To the ...


Vision Based Speech Animation Transferring with Underlying Anatomical Structure

We present a novel method to transfer speech animation recorded in low-resolution videos onto realistic 3D facial models. Unsupervised learning is utilized on a speech video corpus to find the underlying manifold of facial configurations. K-means clustering is applied on the low-dimensional space to find key speaking-related facial shapes. With a small set of laser scanner captured 3D models relate...


Visual speech synthesis from 3D video

Data-driven approaches to 2D facial animation from video have achieved highly realistic results. In this paper we introduce a process for visual speech synthesis from 3D video capture to reproduce the dynamics of 3D face shape and appearance. Animation from real speech is performed by path optimisation over a graph representation of phonetically segmented captured 3D video. A novel similarity m...



Journal:
  • CoRR

Volume: abs/1710.00920

Publication date: 2017